Gradient Descent Finds Over-Parameterized Neural Networks with Sharp Generalization for Nonparametric Regression
We study nonparametric regression by an over-parameterized two-layer neural network trained by gradient descent (GD). We show that, if the network is trained by GD with early stopping, the trained network achieves a sharp rate for the nonparametric regression risk of $\mathcal{O}(\varepsilon_n^2)$, the same rate as that for classical kernel regression trained by GD with early stopping, where $\varepsilon_n$ is the critical population rate of the Neural Tangent Kernel (NTK) associated with the network and $n$ is the size of the training data. Notably, our result does not require distributional assumptions on the covariate beyond boundedness, in strong contrast with many existing results that rely on specific covariate distributions, such as the spherical uniform distribution or distributions satisfying certain restrictive conditions. The rate $\mathcal{O}(\varepsilon_n^2)$ is known to be minimax optimal in specific cases, such as when the NTK has a polynomial eigenvalue decay rate, which holds under certain distributional assumptions on the covariates. Our result thus formally closes the gap between training a classical kernel regression model and training an over-parameterized, finite-width neural network by GD for nonparametric regression, without distributional assumptions on the bounded covariate. We also give affirmative answers to open questions, and address particular concerns, in the literature on training over-parameterized neural networks by GD with early stopping for nonparametric regression, including the characterization of the stopping time, the lower bound on the network width, and the constant learning rate used in GD.
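To make the training procedure in this abstract concrete, here is a minimal sketch of full-batch GD with a constant learning rate and early stopping on an over-parameterized two-layer ReLU network. The width `m`, step size, data model, and held-out stopping rule below are illustrative assumptions, not the paper's exact construction.

```python
# A minimal sketch (not the paper's exact procedure) of training an
# over-parameterized two-layer ReLU network by full-batch GD with a
# constant step size, stopped early on a held-out split. The width m,
# step size eta, and data model are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 200, 5, 2048            # sample size, input dim, network width
X = rng.uniform(-1.0, 1.0, (n, d))                     # bounded covariates
y = np.sin(X @ rng.normal(size=d)) + 0.1 * rng.normal(size=n)

# NTK-style parameterization: f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x),
# with the outer layer a fixed at random signs (a common NTK-regime setup).
W = rng.normal(size=(m, d))       # first layer, trained
a = rng.choice([-1.0, 1.0], m)    # second layer, fixed

def predict(X, W):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

# Hold out part of the data to decide the stopping time.
tr, va = np.arange(0, 160), np.arange(160, n)
eta, best = 0.5, (np.inf, 0)

for t in range(1000):
    H = np.maximum(X[tr] @ W.T, 0.0)          # hidden activations
    r = predict(X[tr], W) - y[tr]             # residuals
    # (Sub)gradient of the loss (1/(2n)) * sum_i (f(x_i) - y_i)^2 w.r.t. W.
    G = ((H > 0) * (r[:, None] * a[None, :])).T @ X[tr] / (len(tr) * np.sqrt(m))
    W -= eta * G                              # constant learning rate
    risk = np.mean((predict(X[va], W) - y[va]) ** 2)
    if risk < best[0]:
        best = (risk, t)

print(f"early-stopped at t={best[1]} with held-out risk {best[0]:.4f}")
```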
Robust Score-Based Quickest Change Detection
Sean Moushegian, Suya Wu, Enmao Diao, Jie Ding, Taposh Banerjee, Vahid Tarokh
Methods for quickest change detection rapidly detect, in real time, a change in the data-generating distribution of an online data stream. Existing methods can detect this change point when the densities of the pre- and post-change distributions are known. Recent work has extended these results to the case where the pre- and post-change distributions are known only through their score functions. This work considers the case where the pre- and post-change score functions are known only to correspond to distributions in two disjoint sets. It employs a pair of "least-favorable" distributions to robustify the existing score-based quickest change detection algorithm, and studies the properties of the resulting procedure. The paper calculates the least-favorable distributions for specific model classes and provides methods for estimating them in common constructions. Simulation results demonstrate the performance of our robust change detection algorithm. In the fields of sensor networks, cyber-physical systems, biology, and neuroscience, the statistical properties of online data streams can suddenly change in response to some application-specific event ([1]-[4]).
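The score-based CUSUM recursion that this abstract builds on can be illustrated with a short sketch. Below, one-dimensional Gaussians stand in for the least-favorable pre/post-change pair, and the increment is a Hyvärinen-score difference; the Gaussian choice, the scale `lam`, and the threshold are illustrative assumptions, not the paper's specific construction.

```python
# A minimal sketch of a score-based CUSUM detector: the statistic is
# driven by the Hyvarinen-score difference between a pre-change and a
# post-change density. The Gaussian densities here stand in for the
# least-favorable pair; lam and the threshold are illustrative choices.
import numpy as np

def hyvarinen_score_gauss(x, mu, var):
    """S_H(x; N(mu, var)) = 0.5*(d/dx log p)^2 + d^2/dx^2 log p."""
    return (x - mu) ** 2 / (2 * var ** 2) - 1.0 / var

def scusum(stream, mu0, var0, mu1, var1, lam=1.0, threshold=20.0):
    """Return the first index at which the statistic crosses the threshold."""
    z = 0.0
    for t, x in enumerate(stream):
        # Expected increment is negative (a Fisher divergence, up to sign)
        # before the change and positive after it, so z drifts up only
        # once the data follow the post-change density.
        inc = hyvarinen_score_gauss(x, mu0, var0) - hyvarinen_score_gauss(x, mu1, var1)
        z = max(0.0, z + lam * inc)           # CUSUM recursion
        if z >= threshold:
            return t
    return None

rng = np.random.default_rng(1)
# Change at t = 500: N(0, 1) -> N(1, 1).
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(1, 1, 500)])
print("alarm at index:", scusum(stream, mu0=0.0, var0=1.0, mu1=1.0, var1=1.0))
```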